<#>MIT Project: From Lines to Networks#> <##>Predicting HDB Resale Prices with Linear Regression and MLP##>
Mathematical Investigative Task (MIT) 2026 Hwa Chong Institution | H2 Mathematics
<##>Project Overview##>
This project demonstrates the mathematical foundations of machine learning by implementing Linear Regression and a Multi-Layer Perceptron (MLP) from scratch using only NumPy. We apply these models to predict HDB resale prices in Singapore, showcasing real-world applications of H2 Mathematics concepts.
<###>Theme###> SDG 11 (Sustainable Cities): understanding housing affordability through data-driven analysis
<##>Mathematical Concepts Demonstrated##>
<###>Linear Regression###>
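As an illustration of the idea behind this section, the sketch below fits a linear model by batch gradient descent on mean squared error using only NumPy. It is not the project's `linear_regression.py`: the function names `fit_linear_regression` and `r2_score`, the learning rate, the epoch count, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y                       # prediction error per sample
        grad_w = (2.0 / n_samples) * (X.T @ residual)  # dL/dw for L = mean(residual^2)
        grad_b = (2.0 / n_samples) * residual.sum()    # dL/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (NOT HDB data): y = 3*x1 - 2*x2 + 5 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

w, b = fit_linear_regression(X, y)
r2 = r2_score(y, X @ w + b)
print("weights:", w, "bias:", b, "R^2:", r2)
```

The inputs here are already on a unit scale. On raw features with very different ranges (floor area versus one-hot columns, for example), gradient descent with a fixed learning rate typically needs the inputs standardised first.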
<##>Dataset##>
Source: Singapore Government Data (data.gov.sg)
File: Resale flat prices based on registration date from Jan-2017 onwards
Size: 229,273 transactions
<###>Features Engineered###>
| Feature | Description | Mathematical Role |
|---------|-------------|-------------------|
| floor_area_sqm | Flat size in square meters | Input variable |
| remaining_lease_years | Years left on lease | Input variable |
| storey_mid | Middle of storey range | Input variable |
| town_* | One-hot encoded towns | Categorical features |
| flat_* | One-hot encoded flat types | Categorical features |
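The engineered features in the table could be derived along the following lines. This is a sketch using only the Python standard library, not the project's `data_loader.py`. It assumes the raw data gives the storey range as text like `10 TO 12` and the remaining lease as text like `61 years 04 months`; the helper names and the sample strings are hypothetical.

```python
import re

def storey_mid(storey_range):
    """Midpoint of a storey range string, e.g. '10 TO 12' -> 11.0."""
    low, high = (int(s) for s in storey_range.split(" TO "))
    return (low + high) / 2

def remaining_lease_years(text):
    """Convert '61 years 04 months' (or just '61 years') to fractional years."""
    years = re.search(r"(\d+)\s*year", text)
    months = re.search(r"(\d+)\s*month", text)
    total = float(years.group(1)) if years else 0.0
    if months:
        total += int(months.group(1)) / 12
    return total

def one_hot(values, prefix):
    """Return (column_names, rows) with one 0/1 column per distinct value."""
    categories = sorted(set(values))
    names = [f"{prefix}_{c}" for c in categories]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return names, rows

# Made-up example values, for illustration only
print(storey_mid("10 TO 12"))
print(remaining_lease_years("61 years 04 months"))
print(one_hot(["BEDOK", "ANG MO KIO", "BEDOK"], "town"))
```

When one-hot columns are used together with an intercept term, one category per group is often dropped to avoid perfectly collinear columns. Whether the project does this is not stated here.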
<##>Results##>
<###>Linear Regression on Real HDB Data###>
<###>Why Linear Regression Works Well###> HDB pricing has mostly linear relationships:
*MLP performance depends heavily on hyperparameters. With proper tuning, it can match or exceed linear regression on this dataset.
<##>Project Structure##>
~/MIT_Project/
├── data/
│   ├── Resale flat prices based on registration date from Jan-2017 onwards.csv (229K rows)
│   └── [4 other CSV files]
├── models/
│   ├── linear_regression.py            # Linear regression from scratch
│   ├── mlp.py                          # MLP from scratch
│   ├── data_loader.py                  # HDB data preprocessing
│   ├── train_hdb_linear.py             # Train LR on HDB data
│   ├── train_hdb_comparison.py         # Compare LR vs MLP
│   ├── linear_regression_params.json   # Saved model weights
│   └── hdb_comparison_results.json     # Comparison results
├── manim/
│   ├── mit_animations.py               # Manim animation scripts
│   └── create_animations.py            # Matplotlib fallback
├── notebooks/
│   └── [For Jupyter exploration]
└── README.md                           # This file
<##>How to Run##>
<###>1. Linear Regression Demo###>
cd ~/MIT_Project/models
python3 linear_regression.py
<###>2. Train on Real HDB Data###>
python3 train_hdb_linear.py
<###>3. Compare LR vs MLP###>
python3 train_hdb_comparison.py
<###>4. MLP on Synthetic Data###>
python3 mlp.py
<##>Animation Storyboard##>
<###>Scene 1: Title###> "From Lines to Networks: How Machines Learn to Predict HDB Resale Prices"
<###>Scene 2: Data Visualization###> Scatter plot of HDB transactions (floor area vs. price)
<###>Scene 3: Linear Regression Model###>
<###>Scene 5: Linear Regression Result###> Best-fit line on HDB data with R² = 0.52
<###>Scene 6: The Non-Linear Problem###> Linear model failing on curved data
<###>Scene 7: MLP Architecture###> Network diagram: Input → Hidden → Output
<###>Scene 8: Forward Pass###> Data flows through network: z⁽¹⁾ = W⁽¹⁾x + b⁽¹⁾, a⁽¹⁾ = ReLU(z⁽¹⁾)
<###>Scene 9: Backpropagation###> Chain rule visualization: ∂L/∂w = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w
<###>Scene 10: Comparison###> Side-by-side: Linear Regression (R²=0.52) vs MLP (R²=0.97 on non-linear)
<###>Scene 11: Conclusion###> Mathematics powers machine learning: statistics, calculus, linear algebra
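The forward pass and chain rule from Scenes 8 and 9 can be sketched in NumPy for a single training example. This is a minimal illustration, not the project's `mlp.py`: it assumes one hidden ReLU layer, a linear output layer, and a half squared-error loss, and the layer sizes and random values are arbitrary. A finite-difference check compares the backpropagated gradient with a numerical estimate.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, params):
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1          # z(1) = W(1) x + b(1)
    a1 = relu(z1)             # a(1) = ReLU(z(1))
    y_hat = W2 @ a1 + b2      # linear output layer (assumed)
    return z1, a1, y_hat

def loss_fn(y_hat, y):
    # Half squared error (assumed loss), so dL/dy_hat = y_hat - y
    return 0.5 * float(np.sum((y_hat - y) ** 2))

def backward(x, y, params):
    """Apply the chain rule layer by layer, from the output back to W1."""
    W1, b1, W2, b2 = params
    z1, a1, y_hat = forward(x, params)
    d_yhat = y_hat - y                 # dL/dy_hat
    dW2 = np.outer(d_yhat, a1)         # dL/dW2
    db2 = d_yhat
    d_a1 = W2.T @ d_yhat               # propagate to hidden activations
    d_z1 = d_a1 * (z1 > 0)             # ReLU derivative is 1 where z1 > 0, else 0
    dW1 = np.outer(d_z1, x)            # dL/dW1
    db1 = d_z1
    return dW1, db1, dW2, db2

# Arbitrary small network: 3 inputs, 4 hidden units, 1 output
rng = np.random.default_rng(1)
params = [rng.normal(size=(4, 3)), rng.normal(size=4),
          rng.normal(size=(1, 4)), rng.normal(size=1)]
x = rng.normal(size=3)
y = np.array([1.5])

dW1, db1, dW2, db2 = backward(x, y, params)

def numerical_grad_W1(i, j, eps=1e-5):
    """Central-difference estimate of dL/dW1[i, j]."""
    W1 = params[0]
    original = W1[i, j]
    W1[i, j] = original + eps
    loss_plus = loss_fn(forward(x, params)[2], y)
    W1[i, j] = original - eps
    loss_minus = loss_fn(forward(x, params)[2], y)
    W1[i, j] = original                # restore the weight
    return (loss_plus - loss_minus) / (2 * eps)

max_error = max(abs(numerical_grad_W1(i, j) - dW1[i, j])
                for i in range(4) for j in range(3))
print("largest gap between backprop and numerical gradient:", max_error)
```

A small gap indicates that the analytic gradients agree with the definition of the derivative, which is a common sanity check before training a from-scratch network.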
<##>Key Takeaways##>
<##>AI Use Declaration##>
AI Tool Used: ChatGPT (Claude Code / Hermes Agent)
Purpose:
<##>References##>
Submitted: Term 2 Week 10, 2026
Group Members: [Your names here]
Class: 26S6B